ABSTRACT

Trans-splicing in trypanosomes adds a 39-nucleotide 
mini-exon from the spliced leader (SL) RNA to the 
5' end of each protein-coding sequence. On the 
other hand, cis-splicing of the few intron-containing
genes requires the U1 small nuclear ribonucleopro- 
tein (snRNP) particle. To search for potential new 
functions of the U1 snRNP in Trypanosoma brucei, 
we applied genome-wide individual-nucleotide reso- 
lution crosslinking-immunoprecipitation (iCLIP), fo- 
cusing on the U1 snRNP-specific proteins U1C and 
U1-70K. Surprisingly, U1C and U1-70K interact not 
only with the U1, but also with U6 and SL RNAs. 
In addition, mapping of crosslinks to the cis-spliced
PAP [poly(A) polymerase] pre-mRNA indicates an active
role of these proteins in 5' splice site recognition.
In sum, our results demonstrate that the iCLIP
approach provides insight into stable and transient 
RNA-protein contacts within the spliceosomal network.
We propose that the U1 snRNP may represent an
evolutionary link between the cis- and trans-splicing
machineries, playing a dual role in 5' splice site
recognition on the trans-spliceosomal SL RNP as well
as on pre-mRNA cis-introns.

INTRODUCTION 

Pre-mRNA splicing, an essential step between transcription 
and translation of most eukaryotic mRNAs, is catalyzed by 
a macromolecular complex termed the spliceosome. Con- 
sisting of small nuclear ribonucleoproteins (snRNPs) and 
other protein components, the spliceosome assembles in a 
stepwise manner on the precursor mRNAs. In Trypanosoma 
brucei, the expression of protein-coding genes, in particu- 
lar the mRNA-processing stages, differs in several respects 
from other eukaryotes: protein-coding genes are organized 
in long polycistronic transcription units, and mRNA maturation
requires coupled trans-splicing and polyadenylation steps.
During trans-splicing, the spliced leader RNA (SL RNA), which
is a constituent of the SL RNP, adds the 39-nucleotide
mini-exon from its 5' end to every protein-coding pre-mRNA,
thereby generating SL-capped mRNAs. In addition to the SL RNP,
the U2, U4/U6 and U5 snRNPs are essential trans-splicing
factors (for review, see (1)).

As confirmed by recent genome-wide studies (2,3), only 
two genes with intronic sequences were identified in T. bru- 
cei, coding for PAP [poly(A) polymerase; Tb927.3.3160] 
and a putative, ATP-dependent DEAD/H RNA helicase 
(Tb927.8.1510). Both contain a single intron, which is removed
by cis-splicing. Although these are likely the only two
cis-introns in T. brucei, this explains the existence of a
U1 snRNP in trypanosomes: cis-splicing requires the recognition
of the 5' splice sites through base-pairing between the
U1 snRNA and the 5' splice site on the pre-mRNA (4). The
trypanosome U1 snRNP is unusual in several aspects: its
three specific protein components, U1-70K, U1C and U1A,
are only distantly related to their known counterparts from
other eukaryotes; in addition, U1-24K was characterized as
a trypanosomatid-specific U1 snRNP protein, which is stably
integrated into the U1 snRNP by protein-protein interactions
(5,6). Interestingly, the trypanosome U1 snRNA, with
75 nucleotides, represents one of the smallest known snRNAs
and lacks a stem-loop II element, which in other orthologs
contains the well-characterized U1A binding site.

Does the U1 snRNP function in trypanosomes only in
cis-splicing of two introns, or are there additional functions
beyond splicing? In other systems, in particular the
mammalian system, there are several lines of evidence for
splicing-independent roles of the U1 snRNP and its protein
components, which seem plausible based on the relatively
high abundance of the U1 snRNP:

First, the U1 snRNP was found to be recruited to intronless
genes (7). Second, the U1-specific protein U1A inhibits
polyadenylation of its own and other pre-mRNAs by interacting
with polyadenylation factors (8,9). Third, a genome-wide
study demonstrated that the U1 snRNP can protect
pre-mRNAs from premature cleavage and polyadenylation
by binding to cryptic 5' splice sites (10,11). Fourth, the U1
snRNP-specific protein U1C plays a peculiar role beyond
constitutive splicing. Specifically, the efficient assembly of
the U1 snRNP on the 5' splice site requires U1C, which
stabilizes base-pairing between the 5' end of the U1 snRNA and
the 5' splice site region (12-16), a role that appears to be U1
snRNA-independent (17). Moreover, a recent global RNA-Seq
study revealed a novel role of U1C during alternative
splicing, primarily during 5' splice site recognition (18).

In line with these studies, tandem-affinity purification of
U1A in trypanosomes identified a large collection of copurifying
factors, among them the polyadenylation factor
CPSF73, suggesting a role of U1A in coupling 3'-processing
and splicing (19).

Recently, four independent RNA-Seq analyses provided 
evidence that alternative splicing and polyadenyla- 
tion are more common in trypanosomes than previously 
thought, raising the question of how these mRNA-processing
steps in trypanosomes are regulated (3,20-22). 

To obtain more insight into known and novel regulatory
functions of the U1 snRNP in trypanosomes,
we have adapted the individual-nucleotide resolution
crosslinking-immunoprecipitation (iCLIP) approach (23),
combined with deep-sequencing, to the trypanosome system.
Genome-wide mapping of the crosslinks generated a
comprehensive map of U1C and U1-70K RNA-interaction
sites: not only the U1 snRNA, but, surprisingly, also the
SL RNA and the U6 snRNA are prominent targets of U1C
and U1-70K. In addition, mapping of U1C crosslinks to
the cis-spliced PAP pre-mRNA indicates an active role of
these U1 snRNP proteins in 5' splice site recognition. Taken
together, our results demonstrate that the iCLIP approach
allows insight into stable and transient RNA-protein contacts
within the spliceosomal network. We propose that the
U1 snRNP may represent an evolutionary link between the
cis- and trans-splicing machineries, playing a dual role in
5' splice site recognition on the SL RNP as well as on
pre-mRNA cis-introns.

MATERIALS AND METHODS 

Cell culture and extract preparation 

For the generation of cell lines expressing PTP-tagged U1C
and U1-70K, the pC-PTP-Neo vector including the ORF
of TbU1-70K (nts 88-831) was used (24). For U1C, the
ORF (nts 13-582) of T. brucei U1C was PCR-amplified
and inserted in-frame into the pC-PTP-NEO vector upstream
of the PTP tag sequence, using ApaI and NotI restriction
sites. For genomic integration, 10 µg of linearized
pC-PTP constructs were transfected into procyclic T. brucei
427 and cloned by limiting dilution in the presence of
G418 (40 µg/ml Geneticin; Gibco-BRL).

Cell culture of T. brucei 427 and 29-13 was described
previously (24,25). Cell lysates were prepared in extraction
buffer (500 mM KCl, 20 mM Tris-Cl, pH 7.7, 3 mM MgCl2,
0.5 mM DTT), containing a Complete Mini, EDTA-free
protease inhibitor cocktail tablet (Roche), using a Dounce
homogenizer followed by sonication. Cell lysates were
supplemented with 0.1% Tween-20, and centrifuged twice at
14 000 rpm for 15 min to remove aggregates.

For starvation experiments, cells (logarithmic phase)
were collected, washed twice in phosphate-buffered saline
(PBS), resuspended in the original volume of PBS, incubated
at 27°C for 90 min, and then returned to pre-warmed
SDM-79 and incubated at 27°C.

Immunofluorescence 

The cellular distribution of U1C-PTP was analyzed by indirect
immunofluorescence as described (26).

iCLIP-Seq 

Three (U1C-PTP) and two (U1-70K) biological replicates
of iCLIP experiments were performed for the respective stable
cell lines. Trypanosoma brucei 427 wild-type (WT) cells
served as a negative control in each replicate. The iCLIP
procedure was performed as described by König et al. (23),
with minor modifications (see below), and combined with
tandem-affinity purification (24). 5 x 10^ procyclic T. brucei
cells were irradiated with UV-C light (3 x 300 mJ/cm^2).
Lysates were prepared in 4 ml extraction buffer (500 mM
KCl, 20 mM Tris-Cl, pH 7.7, 3 mM MgCl2, 0.5 mM DTT)
using a Dounce homogenizer (25 strokes with a type B pestle)
followed by sonication. Extracts were cleared by centrifugation
at 14 000 rpm for 30 min and subsequently, 1 ml
of cleared extract was subjected to combined DNase treatment
(TURBO DNase, Ambion, at a final concentration
of 4 U/ml) and limited RNase digestion (RNase I, Ambion,
at a final concentration of 0.01 U/ml) for 3 min at
37°C. Lysates were centrifuged at 14 000 rpm for 30 min
to remove aggregates. The iCLIP library preparation steps
were performed exactly as described by König et al. (23),
except for the tandem-affinity purification steps. In brief,
U1C or U1-70K RNA-protein complexes were purified
by applying the first step of tandem-affinity purification
(IgG Sepharose 6 Fast Flow, GE Healthcare), followed by
phosphatase treatment, ligation of an RNA adapter at the
3' ends of the RNA tags (T4 RNA ligase; Thermo Scientific)
and radiolabeling using polynucleotide kinase treatment
to allow visualization of covalent RNA-protein complexes.
Bound material was released from the beads by tobacco etch
virus (TEV) protease, followed by the second
affinity step (anti-protein C immunoaffinity purification).
Purified RNA-protein complexes were subjected to sodium
dodecyl sulfate-polyacrylamide gel electrophoresis
(SDS-PAGE), followed by electro-blotting. Complexes were then
recovered by proteinase K treatment. cDNA was generated
by reverse transcription (SuperScript III; Life Technologies),
using oligonucleotides that introduce a 5'-barcode
as well as a BamHI restriction site. The cDNAs obtained were
size-fractionated by denaturing polyacrylamide gel
electrophoresis, circularized (Circligase II, Epicentre), annealed
to an oligonucleotide complementary to the BamHI restriction
site, and cut between the two adapter regions by
BamHI. Linearized molecules were then PCR amplified
(27-32 cycles), using primers with sequencing adapters.

U1C iCLIP libraries were sequenced either on an Illumina
GAIIx (U1C_1, U1C_2; 105-bp single-end reads)
or on an Ion Torrent PGM (U1C_3; single-end reads with
diverse lengths); U1-70K iCLIP libraries were sequenced on
the Illumina MiSeq (U1-70K_1, U1-70K_2; 50-bp single-end
reads).

The sample- and random-barcode sequences were removed
from the 5' end, followed by linker sequence trimming
at the 3' end. Trimmed sequence reads with a minimum
length of 15 bp were aligned to the T. brucei
427 genomic sequence (Tbrucei427Genomic_TriTrypDB-4.2.fasta;
see http://tritrypdb.org). The gene annotation
file (Tbrucei427_TriTrypDB-3.3.gff; see http://tritrypdb.org)
was used for functional analysis. Reads mapped to tRNAs
and rRNAs were excluded, and only uniquely mapped
reads were selected as iCLIP tags for crosslink-site analysis
(for details, see (23)). The raw data containing all sequence
reads as well as the processed data containing all
barcode-filtered tag counts of crosslink sites in the Tb427 genome
and in SL, U1 and U6 RNAs were deposited (NCBI GEO
database: GSE43848).
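
The read-processing steps described above (barcode removal at the 5' end, linker trimming at the 3' end, 15-bp minimum length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the function name, the barcode length and the linker sequence are hypothetical, and only an exact linker match is trimmed.

```python
from typing import Optional, Tuple

MIN_LENGTH = 15  # minimum trimmed read length given in the text


def preprocess_read(read: str, barcode_len: int, linker: str) -> Optional[Tuple[str, str]]:
    """Split off the 5' barcode, trim the 3' linker, apply the length filter.

    barcode_len and linker are hypothetical parameters; the real values
    depend on the library design and are not stated in this section.
    Returns (barcode, trimmed_read), or None if the read is too short.
    """
    if len(read) <= barcode_len:
        return None
    barcode, rest = read[:barcode_len], read[barcode_len:]
    # Assumption: only an exact, full-length linker match is trimmed.
    cut = rest.find(linker)
    trimmed = rest if cut == -1 else rest[:cut]
    if len(trimmed) < MIN_LENGTH:
        return None
    return barcode, trimmed


# Made-up read: 5-nt barcode, 20-nt insert, then the (hypothetical) linker.
example = preprocess_read("ACGTA" + "G" * 20 + "TTTTCCCC", barcode_len=5, linker="TTTTCCCC")
print(example)
```

Reads that pass this filter would then go to the aligner; keeping the barcode alongside the trimmed read is what later allows random-barcode filtering of the mapped tags.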

RNA interference (RNAi) silencing of U1C expression and
real-time RT-PCR

The RNAi construct pLEW100-U1C was made using the
stem-loop vector pLEW100 according to an established
cloning strategy (27). The resulting construct was linearized
with SacII, and 10 µg were transfected into T. brucei 29-13
cells by electroporation. Transformants were cloned by
limiting dilution in the presence of G418 (15 µg/ml),
hygromycin (50 µg/ml) and phleomycin (2.5 µg/ml). RNAi
was induced by the addition of 1 µg/ml of doxycycline.
Cells were counted every day and diluted to 2 x 10^
cells/ml. Semiquantitative as well as quantitative real-time
RT-PCR were performed as described (26).

RNA analysis 

RNA extraction, northern blot analysis and silver staining
were performed as described (26). For protein-RNA
crosslinking, formaldehyde (at a final concentration of 1%)
was added to 5 x 10^ cells in 20 ml SDM-79 medium,
incubated for 20 min, and fixation was quenched by the addition
of glycine (125 mM) for 5 min, while rotating at room
temperature. Cells were washed in 1 x PBS, followed by extract
preparation as described above. For pulldown assays
via the PTP tag, cell extracts were incubated at 4°C with 25 µl
packed IgG Sepharose 6 Fast Flow beads (Invitrogen),
equilibrated in IPP-150 buffer (150 mM KCl, 20 mM Tris-Cl,
pH 7.7, 3 mM MgCl2, 0.5 mM DTT, 0.1% Tween-20). After
washing with the same buffer (or with IPP-500, which
contains 500 mM KCl), coselected RNAs were released by
proteinase K buffer treatment and analyzed by RT-PCR.
Amplification products were analyzed by agarose gel
electrophoresis.

Antibodies and immunoprecipitation analysis 

The open reading frame of T. brucei U1C was PCR amplified
from genomic DNA and cloned into pGEX-6P-2.
Recombinant proteins were expressed with an N-terminal
glutathione S-transferase (GST) tag in Escherichia coli
BL21(DE3)pLys and purified by glutathione affinity
chromatography on an AKTApurifier high-pressure liquid
chromatography system (GE Healthcare). Purified proteins were
then used to immunize rabbits (SeqLab, Germany). The
resulting immune sera were depleted of GST-reactive antibodies
with immobilized GST and affinity-purified, using
recombinantly expressed GST-U1C. Anti-TbU1-70K antibodies
were described previously (5), and immunoprecipitations
were performed according to Jae et al. (28).

RESULTS 

Nuclear localization of T. brucei U1C

To investigate cellular localization and genome-wide RNA
binding of the T. brucei U1C protein, we first generated a
clonal procyclic cell line, which stably expresses U1C with a
C-terminal PTP tag, consisting of two protein A epitopes, a
TEV-cleavage site and a protein C epitope (24). U1C expression
was monitored by western blot analysis, and the cellular
distribution of U1C was characterized by indirect
immunofluorescence (Figure 1). U1C-PTP predominantly
localizes to the nucleus, with only minor staining of the
cytoplasm (Figure 1B), consistent with other spliceosomal
components in T. brucei (e.g. see (26,29)).

In vivo RNA-crosslink sites of U1C and U1-70K reveal a potential
physical link between cis- and trans-spliceosomal components

To search for potential new functions of the trypanosome
U1 snRNP, we next identified RNA interactions of the
U1C protein: we adapted iCLIP in combination with
deep-sequencing [iCLIP-Seq; (23)] to the trypanosome system
(Figure 2A). For comparison, iCLIP-Seq was performed in
parallel with U1-70K, another U1 snRNP-specific protein
component. We made use of the highly efficient, two-step
affinity purification of RNA-protein complexes, based on
a PTP-tagged, stably expressed protein. Briefly, after
UV-mediated in vivo crosslinking (Figure 2A), cell lysis, limited
RNase digestion and the first step of the tandem-affinity
purification (steps #1-3), we performed the subsequent steps
of the iCLIP procedure on beads, including phosphatase
treatment, 3'-RNA linker ligation and polynucleotide kinase
treatment (steps #4-6). Next, we used TEV protease
to release bound material from the beads (step #7) and
applied the second step of the tandem-affinity procedure,
anti-protein C immunoaffinity purification (step #8). Purified
RNA-protein complexes were subjected to SDS-PAGE,
followed by electro-blotting (step #9). Complexes were then
eluted by proteinase K treatment (step #10). After reverse
transcription (RT) and cDNA size selection (steps #11-12),
cDNAs were circularized and BamHI-linearized, followed
by PCR addition of sequencing adapters (steps #13-14) and
sequencing (step #15).

Five independent iCLIP experiments, three for U1C and
two for U1-70K, were performed and sequenced by Illumina
GAIIx, Illumina MiSeq and Ion Torrent PGM [see
Figure 2B for a summary of the three U1C (left) and the
two U1-70K experiments (right); for separate analyses of
each individual experiment, see Supplementary Figure S1].
This yielded a total of ~1.4 million single-end sequence
reads. Following barcode removal and 3' linker trimming,
~900 000 sequence reads (~728 000 for U1C, ~169 000 for
U1-70K) with a minimum length of 15 bp were aligned to
the Tb427 genome. Uniquely mapped sequence reads,
excluding those mapped to tRNAs and rRNAs, were selected
as iCLIP tags for downstream analysis (for details, see
'Materials and Methods' section). Due to the multi-copy array
of the SL RNA locus, the sequence reads were aligned separately
to the 139-nucleotide SL RNA sequence to determine
the iCLIP tags on this RNA.

Figure 1. Expression and nuclear localization of Trypanosoma brucei U1C-PTP. (A) Expression of U1C-PTP was controlled by western blotting with
polyclonal antibodies against the protein A epitope of the PTP tag, comparing wild-type (WT) and U1C-PTP expressing cells (U1C-PTP). Protein size
markers in kDa. (B) T. brucei cells stably expressing U1C-PTP were fixed and stained with DAPI (DAPI). In parallel, PTP-tagged U1C was detected by
anti-protein A primary antibody and Alexa Fluor 488-coupled secondary antibody (U1C-PTP). In addition, merged views are shown ('merge': DAPI and
U1C-PTP staining; 'merge with brightfield').

In summary, ~305 000 tags for U1C and ~95 000 tags
for U1-70K were selected to identify the RNA-binding sites
for the two U1 snRNP proteins on a genome-wide level. In
addition, there were ~51 000 U1C and ~14 000 U1-70K
tags on the SL RNA.

In each of these five experiments, crosslink sites were most
abundant in the U1 snRNA, representing 74% (U1C) and
89% (U1-70K) of the total uniquely mapped tags. To examine
the crosslink-site profile at single-nucleotide resolution,
random-barcode-filtered tag counts were plotted
on the Y-axis for each U1 snRNA nucleotide position
(X-axis). Figure 3A shows the U1 snRNA profile derived from
the sum of three U1C and two U1-70K iCLIP experiments.
Supplementary Figure S2 shows the profiles for the individual
replicate experiments, indicating that the iCLIP tag
profiles for the U1 snRNA are highly reproducible.
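
How a random-barcode-filtered profile of this kind can be derived from mapped tags is sketched below. The sketch makes two assumptions that are not spelled out in this section: tags sharing both start position and random barcode are treated as PCR duplicates and counted once, and the crosslink site is assigned to the nucleotide immediately upstream of the tag start, following the general iCLIP convention that cDNAs truncate at the crosslinked nucleotide. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def crosslink_profile(tags: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Count crosslink sites per nucleotide position on one RNA.

    tags: (start, random_barcode) pairs, with start as the 1-based
    position of the first mapped nucleotide on the sense strand.
    """
    # Assumption: identical (start, barcode) pairs are PCR duplicates.
    unique_tags = set(tags)
    # Assumption: the crosslinked nucleotide lies one position upstream
    # of the tag start; tags starting at position 1 have no upstream
    # nucleotide on this RNA and are skipped.
    counts = Counter(start - 1 for start, _barcode in unique_tags if start > 1)
    return dict(counts)


# Made-up tags: two duplicates at position 10, one distinct barcode there,
# one tag at position 5 and one at position 1.
tags = [(10, "AAA"), (10, "AAA"), (10, "CCC"), (5, "AAA"), (1, "GGG")]
print(crosslink_profile(tags))
```

Plotting the resulting counts against nucleotide position gives a profile of the type shown for the U1 snRNA; summing the dictionaries of several replicates gives the combined profile.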

For U1C, ~40% of the crosslink sites map in the 5' terminal
region (nucleotides 1-9; Figure 3A), consistent with
the known binding site of U1C (5). The remaining sites are
distributed throughout the U1 snRNA sequence, with four
peaks at positions 15, 39/40, 50 and 53. Surprisingly, the
crosslink profiles for U1C and U1-70K closely resemble
each other, except for a U1-70K-specific peak at positions
23/24 (Figure 3A). This is exactly in the central loop of U1
snRNA, which contains the highly conserved U1-70K binding
site AUCACGAA (nucleotides 20-27), confirming our
earlier in vitro binding data (5). To investigate whether the
three downstream peaks reflect additional interactions of
U1C/U1-70K within the U1 snRNA or crosslink sites of
the other U1 snRNA-associated proteins, we performed an
additional control: limited RNase digestions indicated that
the U1 snRNA stayed largely intact at the RNase concentrations
applied during iCLIP, whereas other snRNAs were
already partially degraded (data not shown); this suggests
that the T. brucei U1 snRNP with its unusually short U1
snRNA (75 nucleotides) is highly compact and relatively
stable to RNase digestion (see also 'Discussion' section).

There is also a surprisingly high number of crosslinks in
the SL RNA: about 50 000 for U1C and ~14 000 for U1-70K
(Figure 2B). As seen for the U1 snRNA, the crosslink
site profiles of U1C and U1-70K on the SL RNA revealed a
very similar distribution (Figure 3B), in particular around
the 5' splice site (triple peak at positions 36-43), suggesting
that the entire U1 snRNP engages in the recognition or
activation of the SL RNA 5' splice site. There is a second,
broad peak around the central loop of the SL RNA
(nucleotides 79-82). However, some minor differences were
also detected: U1-70K shows an additional peak at position
67, which is absent for U1C, indicating a U1-70K-specific
contact with the SL RNA.

Figure 2. Genome-wide mapping of U1C and U1-70K RNA-protein interactions in trypanosomes by iCLIP-Seq: strategy and statistics. (A) Schematic
overview of the iCLIP-Seq approach, as adapted for Trypanosoma brucei cell lines stably expressing PTP-tagged RNA-binding proteins (here: U1C). For
a detailed description, see 'Results' section. (B) Summary of distribution of U1C and U1-70K iCLIP tags. The numbers of sequence reads, of uniquely
mapped reads for the Tb427 genome and of the separately aligned SL RNA tags represent the sum of three (U1C) or two (U1-70K) biological replicates
(for the numbers of the individual experiments, see Supplementary Figure S1). The pie charts show the distribution of uniquely mapped reads in snRNAs
(U1, U2, U4, U5 and U6) and genomic regions other than snRNA and SL RNA loci ('others').

Figure 3. Crosslink site profiles of Trypanosoma brucei U1C and U1-70K on the U1, SL and U6 snRNAs. (A-C) The numbers of random-barcode-filtered
iCLIP tag counts for U1C (red line) and U1-70K (blue line) iCLIP tags (crosslink sites) on the U1, SL and U6 snRNAs are plotted in single-nucleotide
resolution. Only truncated versions of the entire RNA sequences without the last 15 nucleotides are shown (U1 snRNA: nucleotides 1-60; SL RNA: 1-123;
U6 snRNA: 1-82), since for technical reasons iCLIP tags further 3' cannot be mapped. Schematic models of the secondary structures are depicted for each
RNA. (A) iCLIP profiles on the U1 snRNA. The U1-70K (red) and the Sm binding sites (green) are boxed, the stem-loop structure is indicated by arrows.
(B) iCLIP profiles on the SL RNA. The 5' splice site (5'ss; after position 39) is highlighted by a red arrow, the Sm site boxed in green. (C) iCLIP profiles
on the U6 snRNA. The highly conserved ACAGAG hexanucleotide, which interacts with the 5' splice site, is boxed in red. (D) Validations of U1C iCLIP
tags for the U1, SL and U6 snRNAs. Cell extracts were prepared from T. brucei wild-type (WT) cells and a cell line stably expressing PTP-tagged U1C,
without (-) and with (+) prior crosslinking by formaldehyde. Extracts were subjected to IgG pulldown of PTP-tagged complexes, followed by crosslink
reversal. Copurifying U1, SL and U6 snRNAs (as indicated on the right) were detected by RT-PCR (lanes P). For comparison, 1% of the total input is
shown (lanes I). M, markers (100 and 200 bp; for panel SL: 100, 200 and 300 bp).

We also mapped iCLIP tags on the U6 snRNA, where,
following the U1 and SL snRNAs, most crosslink tags
had been found (Figures 2B and 3C). In addition to the
5' terminal nucleotides, the two strongest peaks flank the
highly conserved ACAGAG hexanucleotide of U6 snRNA
(nucleotides 37-42), which interacts with the 5' splice site
during spliceosome assembly. Significantly, the U1-70K
crosslink sites concentrate directly upstream of the ACAGAG
box (around position 32). In contrast, only relatively
few or no crosslinks could be mapped to the other
spliceosomal snRNAs, U2, U4 and U5.

To validate these interactions, we used our procyclic
T. brucei cell line, which stably expresses PTP-tagged
U1C: RNA-protein and protein-protein complexes were
formaldehyde-crosslinked in vivo, followed by lysate
preparation and purification of U1C-containing RNPs, based
on the first step of the tandem-affinity purification. After
crosslink reversal and RT-PCR assays, signals for U1, SL
and U6 snRNAs were detected, but not for U2 snRNA
(Figure 3D), confirming specific associations between U1C
protein and the U1, SL and U6 snRNAs. In sum, our genome-wide
iCLIP data provide direct evidence for physical links
between cis-splicing components (U1C; U1 snRNP) and
the trans-splicing machinery (SL RNA).

The 5' splice site of cis-spliced PAP pre-mRNA is recognized
by U1C and U1-70K

Because U1C is thought to be involved in 5' splice site
recognition, we next analyzed crosslink sites on the two
cis-spliced pre-mRNAs, PAP (Tb927.3.3160) and an
ATP-dependent DEAD Box helicase (Tb927.8.1510). The
well-characterized PAP gene was described as the first
protein-coding gene in T. brucei that contains an intron and requires
processing of its primary transcript through cis-splicing
(30). Clearly, both U1C and U1-70K binding sites cluster
around the PAP 5' splice site (positions -1 to +3) (Figure
4). A second, U1C-specific cluster maps to a more downstream
region (+15 to +19).

For the second intron-containing gene, coding for
an ATP-dependent DEAD Box helicase, only few U1C
crosslink sites were identified, probably due to its lower
expression levels. However, 5' splice site interaction can be
clearly seen for both U1 snRNP proteins (positions +11
and +22 relative to the 5' splice site) (Supplementary
Figure S6A).

In sum, we conclude that the 5' splice site of both
cis-spliced pre-mRNAs is contacted in vivo by both U1C and
U1-70K proteins, and therefore most likely recognized by
the entire U1 snRNP.

U1C depletion decreases the efficiency of cis-splicing and
becomes essential under stress conditions

To further evaluate the functional role of U1C during splicing,
we silenced U1C expression by RNAi (Figure 5). Efficient
knockdown was confirmed by RT-PCR detection of
U1C mRNA, using 7SL as loading control; already after
day 1, U1C mRNA levels decreased to half. We also performed
qPCR using the same primer pairs and observed
a knockdown efficiency of approximately 80% after 3 days
(Figure 5A). In addition, we checked for efficient depletion
of U1C protein by western blot: after 3 days of knockdown,
U1C protein was almost undetectable (Figure 5B). To address
the question whether U1C is required for cell viability,
we depleted cells of U1C during a time period of 7 days.
Surprisingly, no significant difference in growth between
induced and uninduced cells could be observed, suggesting
that U1C may not be essential for cell viability of procyclic
T. brucei under the conditions used here (Figure 5C). To
examine whether this might be different under certain stress
conditions, we exposed U1C-depleted trypanosome cells to
starvation stress and measured by their growth how they
recovered (Figure 5D): cells, in which RNAi-knockdown of
U1C had been induced, failed to recover from the starvation
stress, whereas cells without nutrient starvation started
to recover after 1 day.

Next, we asked whether the steady-state levels of the snRNAs
are affected by the RNAi-mediated knockdown. U1C
expression was silenced by RNAi for 72 h, and the
steady-state levels of the SL, U1, U2, U4 and U6 snRNAs were
analyzed by northern blot hybridization during this time
period; as an input control, ribosomal RNAs were detected
by silver staining (Figure 5E). Neither the U1 snRNA nor
any of the other snRNAs were affected by U1C knockdown
(Figure 5E). In addition, we checked for the integrity of the
U1 snRNP, using anti-U1C or anti-U1-70K immunoprecipitation
(Figure 5F). In uninduced cells, the efficiencies
of U1 snRNA immunoprecipitation by anti-U1C and
anti-U1-70K antibodies were ~50 and ~10%, respectively. Upon
U1C knockdown for 72 h, anti-U1C immunoprecipitation
efficiency decreased to approximately 5-10%, whereas the
corresponding value for U1-70K remained unchanged. We
conclude that U1C protein depletion affected neither snRNA
steady-state levels nor U1 snRNP integrity.

To investigate whether U1C is essential for splicing in
vivo, we analyzed cis-splicing by semiquantitative RT-PCR,
using various primer combinations to detect pre-mRNA
and splicing products, and normalizing to U3 RNA expression
(Figure 6). Upon U1C knockdown we observed
cis-splicing defects for the PAP and the ATP-dependent
DEAD Box helicase pre-mRNAs (Supplementary Figure
S6B-D): Specifically, PAP pre-mRNA accumulated (Figure
6B; lanes 1/2), whereas cis-spliced product clearly decreased
upon U1C knockdown, indicating a block of cis-splicing
(lanes 3/4), consistent with the effect on mature mRNA
detected by an SL-exon 2 primer combination (mRNA; lanes
7/8, upper band). In contrast, no change in trans-splicing
efficiency was detected for the same gene (trans-Ex1; lanes
5/6). Interestingly, if cis-splicing was inhibited by RNAi
depletion of U1C, the exon 2 of the PAP pre-mRNA was
subjected to trans-splicing more frequently than in uninduced
cells (trans-Ex2; lanes 7/8, lower band).

As an alternative way of inhibiting the activity of the
U1 snRNP, we also used an antisense morpholino oligonucleotide
(AMO) that specifically blocks the 5' end of the
U1 snRNA, as shown by its ability to specifically select U1
snRNA in total RNA (Supplementary Figure S3A). Following
transfection of T. brucei cells with the U1-specific
versus a control AMO, we analyzed splicing of the PAP gene
by RT-PCR after a 6-h incubation, using different
primer combinations (Supplementary Figures S3B-D).
Upon AMO inhibition of the U1 snRNP, we observed splicing
defects comparable to those seen after U1C depletion by
RNAi: in addition to pre-mRNA accumulation, an increase
of unspliced product was detected (5' unspliced and cis
unspliced). In contrast to the RNAi effect, mature cis-spliced
mRNA signals did not change significantly, probably due to
the relatively short period of AMO incubation.

Figure 4. Crosslink site profiles of Trypanosoma brucei U1C and U1-70K on the poly(A) polymerase (PAP) pre-mRNA. The numbers of random-barcode-filtered
iCLIP tags for U1C and U1-70K are plotted on the Y-axis with the corresponding crosslink sites. Shown are the exon-intron-exon region of the
PAP pre-mRNA (top) and detailed information on the absolute number of CLIP tags at nucleotide resolution for the 5' splice site region (bottom; U1C
above, U1-70K below the sequence); the arrow marks the 5' splice site (5' ss).

Taken together, RNAi-mediated depletion of U1C surprisingly
does not significantly affect parasite growth under normal
conditions, nor does it alter
U1 snRNP integrity; we observe only moderate effects on
cis-splicing, but no effect on normal SL trans-splicing. In
contrast, recovery from starvation stress was clearly impeded
by U1C depletion, indicating that cis-splicing becomes
essential only under stress conditions.

DISCUSSION 

To analyze RNA-protein interaction in the trypanosome
system on a genome-wide level, we adapted the iCLIP approach
(23). The major change concerned the initial
immunoprecipitation stage, where we made use of the highly
efficient tandem-affinity purification technology, combined
with T. brucei cell lines that stably express PTP-tagged proteins
of interest. In our experience such an initial two-step,
high-affinity purification greatly helps to generate
CLIP libraries with very low background, compared with
standard, single-step immunoprecipitations. However, one
should also consider that the two-step affinity purification
can result in relatively low yields and therefore correspondingly
low quantities of final iCLIP tags, in particular for
low-abundance proteins. Here, we analyzed U1C and U1-70K,
two specific protein components of the U1 snRNP. Initially,
we focused on the U1C protein, which, based on studies
in other systems, is particularly important for the correct
recognition of the 5' splice site (see 'Introduction' section for
references). U1-70K, another U1-specific component with
a highly conserved RNA-binding site in the first loop of U1
snRNA, primarily served as an internal specificity control
and for direct comparison.

After evaluating iCLIP patterns for U1C and U1-70K,
including additional control experiments, we conclude that
the iCLIP approach reaches a certain limit when analyzing
relatively small RNPs with multiple protein components
and protein-protein interactions, such as the trypanosomal
U1 snRNP, with its 75-nt snRNA component and a total
of 11 proteins. In such cases, fragmentation by RNase during
the iCLIP procedure may not be sufficient and may result
in RNA fragments with more than a single polypeptide
covalently crosslinked, up to the complete small RNP with
several crosslinks. As a result of such partial or complete
RNase resistance, we observe multiple peaks in the iCLIP
profile. This reflects RNA-protein interactions throughout
the U1 snRNP, due to coprecipitation effects and 'RNA
linkers' between polypeptides that are too short for effective
fragmentation. For example, we see an additional double
peak (nucleotides 50/53 of the U1 snRNA) around the 
Figure 5. Effects of RNAi-mediated depletion of Trypanosoma brucei U1C on growth and U1 snRNP integrity. (A, B) RNAi-mediated knockdown of 
U1C expression. 24, 48 or 72 h after RNAi induction (as indicated), RNA was analyzed by semiquantitative RT-PCR (top) and real-time PCR (bottom). 
In addition, RNA from uninduced cells (t0) was included, and, as a control, 7SL RNA was measured from the same RNA samples (A). M, markers (100, 
200, 300, 400 and 500 bp). In parallel, U1C protein was detected by immunoblotting with affinity-purified polyclonal antibodies, using U2-40K protein 
as a control (B). (C) Growth curve of a representative clonal procyclic T. brucei cell line, in which RNAi against the U1C mRNA was induced. Cells were 
grown for 7 days in the absence (-Dox; gray line with triangles) or in the presence of doxycycline (+Dox; black line with crosses), counted every day, and 
diluted back to 2 x 10^ cells/ml. (D) Recovery of procyclic cells with or without induction of U1C RNAi from starvation stress. Cells were grown for 
1 day in the absence (-Dox) or in the presence of 1 mg/ml doxycycline (+Dox). As a control for growth effects of doxycycline, control cells (ctr) were 
grown for 1 day under the same conditions. 2x10^ cells/ml were collected, washed twice in phosphate-buffered saline (PBS), resuspended in the original 
volume of PBS, and incubated at 27°C for 2 h. After starvation stress, cells were harvested again, resuspended in the original volume of SDM-79, and grown 
in the absence (-Dox; gray line with triangles) or in the presence of doxycycline (+Dox; black line with crosses). (E) U1C knockdown does not affect 
snRNA steady-state levels. From uninduced cells (t0) and 24, 48 and 72 h after silencing of U1C expression, equal amounts of RNA were analyzed by 
northern blot hybridization, using a mixed probe for U2, SL, U4, U6 and U1 snRNAs (top panel). As a loading control, equal amounts of RNAs were 
detected by silver staining (bottom panel). Positions of snRNAs, ribosomal RNAs and tRNAs are marked on the right. M, markers (in nucleotides). (F) 
Immunoprecipitation analysis of U1 snRNPs upon U1C knockdown. From uninduced cells (t0) and 72 h after U1C knockdown (t72), whole-cell extracts 
were prepared and subjected to immunoprecipitation (IP), using anti-U1C or U1-70K antibodies (as indicated). Copurifying RNAs were analyzed by 
northern blotting, using a mixed probe for U2, SL, U4, U6 and U1 snRNAs (positions indicated on the left). For comparison, 5% of each input is shown 
(lanes 'input'). 
Figure 6. U1C depletion decreases the efficiency of cis-splicing. (A) Schematic overview of the primer combinations used to detect PAP [poly(A) polymerase] 
both cis-spliced and cis-unspliced products, as well as trans-spliced mRNA (mRNA) and aberrant trans-splicing at exon 2 (trans-Ex2). (B) Inhibition of 
cis-splicing by U1C knockdown. Total RNA from uninduced (-) and induced (+) cells after 72 h was analyzed by semiquantitative RT-PCR, using the 
primer combinations described in panel (A). As a control, U3 RNA was measured from the same RNA samples. M, markers (100, 200, 300, 400, 500, 600 
and 700 bp). (C) Quantification of RT-PCR reactions shown in panel (B). Error bars represent standard deviations from three independent experiments. 
*P < 0.05 and **P < 0.01 versus uninduced control. 



Sm site (5'-ACUUUG-3'), most likely due to RNA contacts
with one of the Sm polypeptides, which interacts with U1C,
as shown for SmD3 in the human U1 snRNP (16). On the
other hand, there are still protein-specific crosslink peaks,
which are due to partial fragmentation of the RNP in this
region, e.g. for U1-70K (at nucleotides 23/24 in the central
loop). Finally, we cannot assign two other prominent peaks
upstream (at nucleotides 15 and 39/40), which suggest additional
protein contacts, e.g. by the U1 snRNP components
U1A or U1-24K. 

In spite of these intrinsic limitations, which should apply 
also to CLIP analyses of other small RNPs, RNA-protein 
interaction maps can be deduced, which allow insights into 
spliceosomal networks on the level of individual snRNAs 
and specific proteins. Due to spliceosomal dynamics, we 
have to be aware that we observe the sum of various con- 
formational states with different occupancies. Focusing here 
on U1C and U1-70K, we found iCLIP tags not only in the U1
snRNA, but also in the SL RNA and the U6 snRNA. 



We were very surprised that most of the U1C/U1-70K
crosslinks on the SL RNA map to its 5' splice site, because
this suggests that the U1 snRNP participates in the recognition
and/or activation of the 5' splice site common to
all trans-splicing reactions. U1C may be directly involved
in the SL 5' splice site activation, and, because we see the
same crosslink pattern for U1-70K, it is likely the entire U1
snRNP that interacts. Because no SL RNP proteins other than
the Sm proteins are known in trypanosomes, the
other major RNA-protein contact on the SL RNA (nucleotides
79-83), detected for both U1C and U1-70K, most
likely indicates a novel protein bound at the central loop
and interacting with U1C/U1-70K. 

In the case of the two known cis-spliced 5' splice sites
(PAP and DEAD Box RNA helicase), the iCLIP crosslink
patterns clearly peak in the first intronic positions of the 5'
splice sites. In particular, this was shown for both U1C and
U1-70K sequence tags, which peak in the first intronic
positions of the PAP 5' splice site, suggesting a direct
involvement of the U1 snRNP in 5' splice site recognition. 
Figure 7. Model of 5' splice site recognition in the PAP pre-mRNA and summary of U1C/U1-70K iCLIP tags. This schematic shows how the 5' splice 
site in the cis-intron of the poly(A) polymerase (PAP) pre-mRNA can be sequentially recognized by the U1 snRNA (5' terminal region) and the U6 
snRNA (internal region), based on the proposed U1 snRNA (27) and a hypothetical, extended U6 snRNA base-pairing (this study). Boxed are the exonic 
nucleotides of the PAP pre-mRNA and the conserved ACAGAG sequence of the U6 snRNA. The iCLIP-derived U1C/U1-70K-crosslinked nucleotides 
in all three RNAs (U1, U6 and PAP) are highlighted in red, to allow direct comparison with the base-pairing interactions. 



Finally, when comparing the iCLIP patterns for the snRNAs
and the PAP 5' splice site, we note that the absolute
number of crosslink tags differs within a range of three orders
of magnitude (compare Figures 3 and 4). This can be
explained not only by different expression levels of the
respective target RNAs, but also by the transient versus stable
nature of certain RNA-protein interactions and corresponding
occupancy times. 

We summarize these crosslink characteristics for the PAP
cis-intron, the U1 and U6 snRNAs in a model, focusing on
U1C/U1-70K and taking into account our knowledge of
spliceosomal dynamics established in the yeast and mammalian
systems (see Figure 7). Early in spliceosome assembly,
the U1 snRNA extensively base-pairs with the PAP 5'
splice site, as proposed by Mair et al. (30) and supported
here by directly adjacent crosslink positions in the PAP intron
and the U1 snRNA (see nucleotides marked in red).
Later, U6 snRNA replaces U1, and we propose this extended
base-pairing, based on the classical register between
the U6 ACAGAG sequence and the first intronic positions
of the 5' splice site. The crosslink patterns derived from our
iCLIP data and summarized in Figure 7 support that such a
sequential, multiple 5' splice site recognition and U1-to-U6
'handing-over' operates also in the trypanosome system. 

A surprising implication of our first iCLIP study in the
trypanosome system is that it suggests a physical linkage
between cis- and trans-splicing, in particular between the U1
snRNP and the SL RNP: the 5' splice sites of both the PAP
cis-intron and the trans-spliceosomal SL RNA are contacted by
U1C and U1-70K. Based on this, the most provocative hypothesis
would be that the U1 snRNP participates in activation
of the SL RNA 5' splice site. This would be in contrast
to results in the nematode system, where U1 snRNA appears
not to be essential for splicing: first, knockout of
over 90% of U1 snRNA did not affect trans-splicing in vitro
(31); second, only traces of U1 snRNA were detectable in
purified trans-spliceosomes, probably due to cryptic 5' splice
sites (32); third, as proposed by Bruzik et al. (33), the SL
RNA may use its base-paired structure around the 5' splice
site to be activated in a U1-independent manner. However,
note that we cannot be certain that the nematode results apply
to the trypanosome system, with its highly divergent U1
snRNA and very few cis-introns. 



On the other hand, the SL RNA is clearly present in
both nematode cis- and trans-spliceosomes (32), suggesting
a common cis/trans-spliceosome, a close linkage between
the two types of splicing reactions, and that the decision
between cis- and trans-splicing occurs after spliceosome
assembly. We consider such a scenario likely to operate also
in trypanosomes, and our result on the competitive usage
of the 3' splice site in the PAP intron supports this notion:
upon U1C knockdown we detected a partial switch of the
PAP 3' splice site from normal cis- to aberrant trans-splicing
(Figure 6B, lanes 7/8). 

Why did we not observe a severe growth defect after U1C
(Figure 5) or U1-70K knockdown (Supplementary Figure
S4) under 'normal' conditions? First, the U1 snRNP
and the U1C protein are apparently not essential for
trans-splicing in trypanosomes, at least under normal conditions,
consistent with earlier nematode studies (see above). The
U1C/U1-70K contact with the SL 5' splice site we described
may be functionally relevant only under certain growth
conditions, such as during stress, or for a subset of genes.
However, a high-throughput phenotype analysis revealed that
for either of the U1 snRNP-specific proteins U1C and U1-70K
no significant loss of fitness was observed upon knockdown
in the different life cycle stages (34). Alternatively, it
may simply represent a minor specificity or efficiency
determinant not significantly relevant for normal growth or
splicing activity under our experimental conditions. 

Second, regarding the two cis-introns of T. brucei, their
splicing efficiency was clearly affected upon U1C knockdown,
although only moderately, consistent with an earlier
report on U1-70K knockdown (19). The intrinsic inefficiency
of cis-splicing in trypanosomes can explain the
moderate extent of this effect: already under normal growth
conditions a high number of intronic reads was measured
by RNA-Seq, reflecting unspliced, intron-containing
pre-mRNA (Supplementary Figure S5). Why did we not detect
any significant defect in growth after knockdown? We
suggest that there is redundancy among several paralogous
poly(A) polymerases in the genome, or that even the
intron-containing PAP (Tb927.3.3160) may not represent
the major functional gene responsible for mRNA 3' end
processing. The same argument holds for the second
intron-containing gene (Tb927.8.1510), coding for a putative RNA
helicase of unknown functionality and specificity. This
hypothesis is further supported by the high-throughput RNAi
analysis of Alsford et al., in which knockdown of neither of
these two genes showed a severe effect (34). 

However, when exposing U1C-depleted cells to starvation
stress, a severe growth defect was observed, indicating
that U1C and cis-splicing become essential in the response
to nutrient starvation. 

Ultimately, this points to the possibility that cis-splicing
of the two introns itself may not be required for growth
of the parasite under 'standard' conditions. For both
intron-containing genes functional intronless putative
counterparts may exist (for the poly(A) polymerase:
Tb927.3.3160/Tb927.7.3780; for the ATP-dependent
DEAD Box helicase: Tb927.8.1510/Tb927.10.6630),
so that the spliceosomal U1 snRNP may represent an
evolutionary relic. However, this does not exclude that the
two introns may fulfill an unknown important function,
and that other U1 RNPs with variant protein compositions
play roles beyond splicing. 